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Abstract — Compressive sensing is a methodology for the re- 
construction of sparse or compressible signals using far fewer 
samples than required by the Nyquist criterion. However, many 
of the results in compressive sensing concern random sampling 
matrices such as Gaussian and Bernoulli matrices. In common 
physically feasible signal acquisition and reconstruction scenarios 
such as super-resolution of images, the sensing matrix has a 
non-random structure with highly correlated columns. Here we 
present a compressive sensing recovery algorithm that exploits 
this correlation structure. We provide algorithmic justification as 
well as empirical comparisons. 

I. Introduction 

Consider the problem of image super-resolution, where one 
or more low-resolution images of a scene are used to synthe- 
size a single image of higher resolution. If multiple images are 
used, they are commonly assumed to be subpixel-shifted and 
downsampled versions of the original high resolution image 
that is to be reconstructed fil. Alternatively, super-resolution 
from a single low resolution image using a dictionary of image 
patches and compressive sensing recovery has been proposed 
in (2). The relationship between the available low resolution 
and desired high resolution image is commonly modeled by 
a linear filtering and downsampling operation. Suppose that 
we wish to reconstruct a size N x N high resolution image 
from a lower resolution image, for example of size y x ^, 
or smaller. Let x and y represent the vectorized high and low 
resolution images respectively. We model the formation of y 
from x by the equation y = SHx + r\ where r\ is the sensor 
noise, S is a downsampling matrix of size y * t by N 2 , 
and H is a N 2 by A 2 matrix that represents the filtering 
(antialiasing) operation. In order to consider super-resolution 
as a compressive sensing recovery problem we write x = ^>c 
where 'k is a sparsifying basis for the class of images under 
consideration and c is the coefficient vector corresponding to 
image x with respect to the basis >k. In the simplest case, ^ 
is an N 2 x A 2 orthogonal matrix, but can also be generalized 
to an overcomplete dictionary. 

We have 

y = SH^c + ?/ = $c + ?/, 

where $ = SH^> is the sampling matrix. 
Most of the work in the compressive sensing literature 
assumes <E> to be random matrix, such as a partial DFT or one 



drawn from a Gaussian or Bernoulli distribution. However, 
in this scenario the matrix is not random, but instead has 
correlated columns whose structure we wish to exploit to 
improve compressive sensing recovery. Here we assume that 
H is not a perfect low pass filter, so that it is possible for 
<I> = SH*i? to preserve enough high frequency information for 
recovery to be possible; SH and >k have sufficient incoherency 
to allow c to be recovered with acceptable error. 

Compressed sensing provides techniques for stable sparse 
recovery J3)-||5], but results for coherent sensing matrices have 
been limited |6|, [7|. 

Organization. The structure we wish to exploit is first 
described. Then we present algorithms that take advantage of 
this structure for compressive sensing recovery. 

II. Correlation Structure 

Typical examples of sparsifying bases ^ for images are 
wavelets and blockwise discrete cosine transform bases. Im- 
ages exhibit correlation at each scale: neighboring pixels 
are heavily correlated except across edges, local averages of 
neighboring blocks are heavily correlated except across edges, 
and so on. This makes wavelet-like bases, which have locally 
restricted atoms, suitable for sparsifying the image. For the 
super-resolution setting with the low resolution image of size 
Y x y, the rows of SH consist of shifted versions of the 
filtering kernel with shifts of 2 horizontally and vertically. 
Due to the localized nature of wavelet bases, we expect 
columns of $ that correspond to spatially distant bases in ^ 
to have little correlation. If $ is a tree structured orthogonal 
wavelet basis matrix, columns of ^ that overlap spatially 
are orthogonal, however when filtered by H, they result in 
significant correlation. Then we expect columns in <E> to show 
significant correlation in tree structured patterns. 

We illustrate this with an example. For simplicity we 
consider only one-dimensional signals, though the discussion 
is equally valid for images. Suppose that is a 256 x 256 
matrix whose columns consist of the length 256 Haar basis 
vectors, and SH is a 128 x 256 matrix obtained by shifting 
the filter kernel h = {0.1, 0.2, 0.4, 0.2, 0.1} by two from one 
row to the next. SH represents the filtering and downsampling 
operation that generates the low resolution signal y = SHx 
from the length 256 signal x. Then $ = SH^ is the sampling 
matrix. 



Fig. [T] shows the absolute values of the correlation matrix 
C = (here and throughout A* denotes the adjoint of A). 
This shows that only a small number of pairs of columns of 
$ are strongly correlated to each other. Each filtered wavelet 
basis is correlated with other spatially overlapping bases at 
coarser and finer scale and in the immediate neighborhood, 
but has no correlation with spatially distant bases. 
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Fig. 1: Absolute values of 

More generally, consider compressive sensing recovery 
where the columns of the sampling matrix $ can be grouped 
into nearly-isolated sets, such that correlation among pairs 
of columns within a set may be significant, but correlation 
between two columns that belong to different sets is relatively 
small. How does one exploit this structure to efficiently 
reconstruct the signal? 

One of the central results in compressive sensing is that if 
matrix $ exhibits a property called the Restricted Isometry 
Property (RIP) [8|-[10|, convex optimization can recover the 
sparse signal exactly||TT|, p"2| via 



min ||c||, such that y = <E>c. 



(1) 



However, the sampling matrix $ = described above 
does not obey the RIP and these results are not readily appli- 
cable. On the other hand, it is commonly found in practical 
applications and has a structure that could be exploited. 

Before considering the above problem, a simple modi- 
fication to CoSaMP fl3) is presented that provides some 
improvement in recovery performance. This algorithm, called 
Partial Inversion (Partlnv) and described by Algorithm [T] also 
indicates how the above described structure could be exploited. 

III. Partial Inversion 

Consider the usual CS setting: Given a length M sample 
vector y — $c + r\ where $ is an M x N sampling matrix and 
c a length N vector with sparsity K < M, we wish to obtain 
the best TT-sparse approximation c to c. At each step let I be 
an index set, so that for example, cj represents an estimate of 
the components of c corresponding to the column indices in 
I. c by itself is an estimate for all the columns {1..N}. Let 
L for K < L < M be an adjustable parameter for the size 
of the set /. We get good results with L = max{K, 0.8M}. 
Let <!>/ denote the matrix of columns from $ corresponding to 
indices in set the /. Let / = {l..iV}\7 denote the complement 
of /. For any full rank matrix A, define A t = (A* A)" 1 A*. 



For the noiseless case 77 = 0, the stopping condition can 
be obtained by testing the magnitude of T2 = y — $c at the 
start of each iteration. If set I does not vary from one iteration 
to the next, the algorithm cannot progress further and can be 
stopped immediately. In practice the inversion of line 3 can be 
done efficiently by Richardson's algorithm (see e.g. Sec. 7.2 
of (l4)). If |j( fc '| < M, we may replace line 6 by Cj(k) 

This algorithm demonstrates improvement relative to 
CoSaMP when the accurate recovery region is considered on 



a plot of jj versus The motivation is the following (for 
simplicity we drop the iteration indicator k) : From line 3, 



ci = ®\y 



Cl + ($|$ / )- 1 $l$ f cj. 



(2) 
(3) 



Compare this to the estimator ci = used in CoSaMP. 
When r = y, we have 

cj = <£}y (4) 
= $J$ /C/ + $*$ JC/ (5) 
= cj + ($j$r - I)ci + $j$/cj (6) 

If the index set I contains several nonzero coefficients 
(which we hope is true), then — I)cj, which results 

from the mutual interference between the columns of <£>/, 
is significant and is a source of noise in cj. This term is 
eliminated in Partial inversion does add to the 

remaining noise term, however, the singular values of this term 
can be kept from significantly amplifying the noise term by 
a conservative choice of L, the size of the index set I. The 
improved estimate cj further produces an improved estimate 
Cj(fc), which leads to a better selection of nonzero coefficients 
in the next iteration. 

The expression Q also indicates how the correlation 
structure may be used to improve recovery. The noise term 
($*(j) / )- 1 $*(j)- c ~ depends upon the correlation between the 
sets and <I>j given by This correlation is weak if 

and are sufficiently spread. However, the correlation is 




Fig. 2: Proportion of successes on Gaussian matrices using (a) Part- 
Inv, (b) CoSaMP and (c) l\ -minimization, and proportion of successes 
on correlated column subset matrices using (d) Partlnv, (e) CoSaMP 
and (f) l\ -minimization for various values of S — 4| £ (0,1) 
(horizontal axis) and p = |y G (0, 1) (vertical axis). 



likely to remain large if L is significant compared to M, as 
will be the case when ~ is large. 

IV. Experimental Comparison 

We compare the recovery performance of Partial Inversion 
with CoSaMP and convex optimization Q for two classes of 
matrices: Gaussian random matrices, and matrices constructed 
to have highly correlated subsets of columns with low corre- 
lation across subsets. 

In the first case, we construct M by TV matrices with 
N(0, 1) elements along with the coefficient vector c containing 
K nonzero entries taken from a N(0, 1) distribution. The 
nonzero locations are selected uniformly at random from 
{1...N}. Each column in each matrix is normalized to have 
unit li norm. We set N — 256 and vary S — M- from 0.1 
to 0.9 in steps of 0.1. For each S we vary p — jj from 0.1 
to 0.9 in steps of 0. 1 . For each (5, p) point we carry out 25 
trials, and declare success if -h\\c — c|| 2 < 10 -5 . For Partlnv 
we considered two cases for the size of subset I : L = S and 
L = max{S, 0.8A/}. We see better performance in the L = S 
case. For li minimization we use the ^-magic package [15]. 
We show the results in Fig. [2] 

In the second case, we construct M by N matrices with 
N = 256 and variable M and a block diagonal structure. 
The columns are divided into 16 column subsets. In each 
subset we set M/16 rows to 1. In addition, to every element 
of the matrix we add noise drawn from a zero-mean normal 



distribution with variance 0.0025. This produces heavy intra - 
subset correlation and light correlation across subsets. We let 
the coefficient vector c contain S nonzeros elements drawn 
from a N(0, 1) distribution. We select 4 of the 16 subsets at 
random and in each subset select f of the indices to have 
nonzero values, again uniformly at random. If some of the 
nonzeros were left over, they are accomodated in a fifth subset. 
For Partlnv we set L = max{5, 0.8M}. The results are also 
depicted in Fig. [2] 

V. Recovery of Coefficients Concentrated on 
Wavelet Trees 

We next use Partial Inversion to recover nonzero coeffi- 
cients that are concentrated on wavelet trees, which is com- 
monly seen when a signal or image with discontinuities is 
decomposed in a wavelet basis. When the coefficients are 
concentrated on an isolated set (a set of columns that have low 
correlation with columns outside the set), a setwise estimator is 
especially useful to identify the sets on which the coefficients 
are nonzero. Consider the 2D wavelet case. Suppose that / 
is the index set of columns of the wavelet basis belonging to 
a particular tree rooted at a coarse scale and containing finer 
scale coefficients. We have 

zi = $}y = $}$/c 7 + $J$ jCf . (7) 

Because $/ is relatively isolated from the columns in $j, 
the second term is small, and because most of the elements of 
cj are nonzero, the first term is large. This is further intensified 
by the mutual correlation of the columns of which is high 
because of the spatial overlap of the support of the wavelet 
bases in the tree. This motivates a simple selection criterion 
for measuring the strength of the nonzero coefficients in each 
wavelet tree /: sj = V^ \zj\. We use this criterion along with 

Partlnv to select wavelet trees that are known to be nonzero. 
We denote the number of subsets by SETNUM. 

We modify the Partlnv algorithm to use this estimator. 

VI. Experimental Results 

To test this algorithm, we use the Daubechies-5 wavelet 
basis in two dimensions over 32 x 32 size patches with 5 
levels of decomposition. This gives a size 1024 by 1024 matrix 
"J. We divide this matrix into 49 sets: 1 set of the coarsest 
scale coefficients in a block of size 4x4 containing the two 
coarsest scales, and 48 other sets rooted at the coefficients at 
the next finer scale. Each of these sets contains 21 (1 + 4+16) 
coefficients in a quadtree structure. To create matrix $ we 
first apply a blurring filter H with a symmetric 3x3 kernel 
that is close to a delta function. This simulates practical 
optical sampling acquisition effects such as diffraction and 
helps prevent rank deficiency problems when carrying out 
inversion. We use different 2D sampling patterns to carry out 
the subsampling operation represented by matrix S. Hence 
the acquisition process is represented by y = $c where 
<f> = SHfy. The sampling patterns are shown in Table [I] 
for each sampling rate S = 4C used to generate the results. 



Algorithm 2 Given y = <£>c, with K nonzero coefficients con- 
centrated on wavelet trees,return best if -sparse approximation 

c 

1: c <- 5>*y; 

2: k i 1 

3: for j = 1 -> SETNUM do 

4: Sj «- Y, fill 

5: end for 

6: / fe + 1 <— indices of columns contained in the sets with the 
largest magnitude s&, to include at least K coefficients. 

k <- k + 1 

while Stopping condition not met do 



4> 



r <-y 
j(fe) ^ >w 

P 4- [Id - $ z 
c jW 4- P*r 
Repeat lines 3 
fe 4— fc + 1 
end while 



$!( fc) ]$j<*> 



Each pattern is replicated 8 times in horizontal and vertical 
directions to give the 32 x 32 sampling pattern used for matrix 
S. The filter kernel is a 3 x 3 kernel with 0.29 at the center 
and 0.02 in other locations. 
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TABLE I: Sampling Patterns 



The results are shown in Fig. [3] For each data point we carry 
out 100 trials. We declare success if i. ] jc — c|| 2 < 10~ 5 where 
N = 32 x 32. This shows improvement in selection perfor- 
mance with the sum estimator. This estimator is much simpler 
than the Condensing Sort and Select Algorithm (CSSA) used 
in p6) to incorporate wavelet tree structure in CoSAMP but 
gives better results. 

VII. Conclusion 

We consider methods of compressive sensing recovery for 
sampling matrices that have subsets of columns that are 




Fig. 3: Proportion of successes out of 100 trials with nonzero coeffi- 
cients concentrated on wavelet trees. 



strongly intra-correlated, but show low correlation with other 
subsets. This structure commonly arises in physical sam- 
ple acquisition/reconstruction scenarios such as image super- 
resolution. We describe Partial Inversion, an algorithm that im- 
proves compressive sensing recovery by removing a source of 
noise in the initial estimator, and demonstrate its performance 
by simulations on Gaussian and correlated column subset 
matrices. We consider compressive sensing recovery when the 
nonzero coefficients are concentrated on wavelet trees, and 
demonstrate a simple estimator that improves selection of the 
trees that carry the nonzero coefficients. 
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